    Employing MPI Collectives for Timing Analysis on Embedded Multi-Cores

    Static worst-case execution time (WCET) analysis of parallel programs running on shared-memory multicores suffers from high pessimism. As an alternative, distributed-memory platforms that communicate via messages may be one solution for manycore systems. The Message Passing Interface (MPI) is a standard for communication on these platforms. We show how its concept of collective operations can be employed for timing analysis. The idea is that the WCET of a parallel program may be estimated by adding the WCET estimates of the sequential program parts to the WCET estimates of the communication parts. To this end, we first analyse the two MPI operations MPI_Allreduce and MPI_Sendrecv. Using these results, we perform a timing analysis of the conjugate gradient (CG) benchmark from the NAS Parallel Benchmarks suite.
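    The additive composition described above can be illustrated with a minimal sketch. It assumes that the program alternates strictly between sequential and communication phases, that each phase already has its own WCET estimate (for a communication phase, for example one obtained for MPI_Allreduce or MPI_Sendrecv), and that the loop bound is known. The names `Phase`, `program_wcet`, and `wcet_by_kind`, the phase breakdown, and all cycle counts are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Phase:
    name: str
    kind: str          # "compute" (sequential part) or "comm" (MPI operation)
    wcet_cycles: int   # WCET estimate for this phase, obtained separately


def program_wcet(phases: Sequence[Phase], iterations: int = 1) -> int:
    """Bound a loop of phases by summing per-phase WCET estimates.

    Assumes the phases execute strictly one after another and that each
    estimate already accounts for any interference it can suffer.
    """
    per_iteration = sum(p.wcet_cycles for p in phases)
    return iterations * per_iteration


def wcet_by_kind(phases: Sequence[Phase], kind: str) -> int:
    """Share of one iteration's bound contributed by one kind of phase."""
    return sum(p.wcet_cycles for p in phases if p.kind == kind)


# Hypothetical CG-like iteration; the cycle counts are made up for illustration.
cg_iteration = [
    Phase("sparse mat-vec (local)", "compute", 12_000),
    Phase("vector exchange with partner rank", "comm", 3_000),  # e.g. via MPI_Sendrecv
    Phase("dot product (local)", "compute", 800),
    Phase("global sum", "comm", 1_500),                         # e.g. via MPI_Allreduce
]

print(program_wcet(cg_iteration, iterations=25))  # 432500
```

    Summation is only safe when the per-phase bounds already include any interference between phases; otherwise the composed value is not a valid upper bound.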

    Redundant dataflow applications on clustered manycore architectures

    Increasing performance requirements in the embedded systems domain have encouraged a shift from single-core to multicore processors. Cars are an example of complex embedded systems in which the use of multicores continues to grow. The requirements of software components running in modern cars are diverse: on the one hand there are safety-critical tasks such as airbag control, on the other hand tasks without any safety-related requirements, for example those controlling the infotainment system. Trends such as autonomous driving lead to tasks that are both safety-critical and computationally complex. To satisfy the requirements of modern embedded applications, we developed a dataflow-based runtime environment (RTE) for clustered manycore architectures. The RTE can execute dataflow graphs in various redundancy configurations and with different schedulers. We implemented our RTE design on the Kalray Bostan Massively Parallel Processor Array and evaluated all possible configurations for three common computation tasks. To assess the performance of our RTE, we compared the non-redundant graph executions with OpenCL versions of the three applications. The results show that our RTE can come close to or even surpass Kalray's OpenCL framework, although maximum performance was not the primary goal of our design.
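    The abstract does not say which redundancy configurations the RTE supports. A minimal sketch of one common option, triple modular redundancy (TMR) with majority voting applied to a single dataflow actor, is given here; the function names, the simulated fault, and the voting policy are assumptions for illustration only. With two replicas the same code detects a mismatch but cannot mask it.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

In = TypeVar("In")
Out = TypeVar("Out", bound=Hashable)


def majority_vote(outputs: Sequence[Out]) -> Out:
    """Return the value produced by a strict majority of replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        # e.g. two replicas that disagree: the error is detected but not masked
        raise RuntimeError("replica outputs disagree; no majority")
    return value


def fire_redundant(replicas: Sequence[Callable[[In], Out]], token: In) -> Out:
    """Fire every replica of one dataflow actor on the same input token and vote."""
    # Runs sequentially here; on a clustered manycore each replica could run on its own cluster.
    outputs = [actor(token) for actor in replicas]
    return majority_vote(outputs)


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    return x * x + 1  # simulated transient fault in one replica


print(fire_redundant([square, square, faulty_square], 7))  # 49
```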

    Beyond Dataflow

    This paper presents some recent advanced dataflow architectures. While the dataflow concept offers the potential for high performance, the performance of an actual dataflow implementation can be restricted by a limited number of functional units, limited memory bandwidth, and the need to associatively match pending operations with available functional units. Since the early 1970s, there have been significant developments in both fundamental research and practical realizations of dataflow models of computation. In particular, there has been active research and development in multithreaded architectures that evolved from the dataflow model. Other techniques for combining control flow and dataflow have also emerged, such as coarse-grain dataflow, dataflow with complex machine operations, RISC dataflow, and micro dataflow. These developments have also had some impact on the design of high-performance superscalar processors in the “post-RISC” era.
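    The associative matching mentioned above is easiest to see in the classic tagged-token form of dynamic dataflow, where a two-operand instruction fires only once both operands carrying the same context tag have arrived. The sketch models the matching store as a Python dictionary keyed by (tag, instruction); the graph, the token layout, and all names are hypothetical. Real machines implement this store in hardware, which is one source of the overhead the text mentions.

```python
import operator
from collections import deque

# Hypothetical dataflow graph computing (a + b) * (a - b).
# instruction -> (operation, destinations as (instruction, input port))
PROGRAM = {
    "add": (operator.add, [("mul", 0)]),
    "sub": (operator.sub, [("mul", 1)]),
    "mul": (operator.mul, []),  # no destination: result leaves the graph
}


def run(program, tokens):
    """Fire each two-operand instruction as soon as both operands with the same tag meet."""
    queue = deque(tokens)  # token = (tag, instruction, port, value)
    matching_store = {}    # (tag, instruction) -> (port, value) of the waiting operand
    results = {}
    while queue:
        tag, instr, port, value = queue.popleft()
        key = (tag, instr)
        if key not in matching_store:
            matching_store[key] = (port, value)  # first operand waits for its partner
            continue
        _, partner = matching_store.pop(key)     # associative match on (tag, instruction)
        left, right = (value, partner) if port == 0 else (partner, value)
        op, destinations = program[instr]
        result = op(left, right)
        if not destinations:
            results[tag] = result
        for dest, dest_port in destinations:
            queue.append((tag, dest, dest_port, result))
    return results


def inputs(tag, a, b):
    """Initial tokens for one activation (context) of the graph."""
    return [(tag, "add", 0, a), (tag, "add", 1, b), (tag, "sub", 0, a), (tag, "sub", 1, b)]


# Two independent activations share the same instructions; tags keep them apart.
print(run(PROGRAM, inputs(0, 5, 3) + inputs(1, 10, 4)))  # {0: 16, 1: 84}
```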

    An artificial hormone system for self-organization of networked nodes

    The rising complexity of distributed computer systems gives reason to investigate self-organization mechanisms for building systems that are self-managing in the sense of Autonomic and Organic Computing. In this paper we propose the Artificial Hormone System (AHS) as a general approach to building self-organizing systems based on networked nodes. The AHS implements an information exchange between networked nodes similar to the one the human hormone system performs between cells. The artificial hormone values are piggy-backed on messages to minimize communication overhead. To show the efficiency of the mechanism even for large-scale systems, we implemented a simulation environment in Java to evaluate different optimization strategies. The evaluations show that local information is sufficient to meet a global optimization criterion.
    1st IFIP International Conference on Biologically Inspired Cooperative Computing - Biological Inspiration 2
    Red de Universidades con Carreras en Informática (RedUNCI)
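    A minimal sketch of the piggy-backing idea, assuming each node advertises a single numeric hormone value and decides locally whether to take on a task. The `Node` and `Message` classes, the meaning of the value, and the decision rule (highest known value wins, ties broken by node id) are hypothetical illustrations, not the AHS as specified in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    sender: int
    payload: str
    hormones: Dict[int, float]  # piggy-backed hormone values: node id -> value


@dataclass
class Node:
    node_id: int
    hormone: float  # hypothetical local value, e.g. how suited this node is for a task
    known: Dict[int, float] = field(default_factory=dict)  # values heard from other nodes

    def send(self, payload: str) -> Message:
        # The hormone value rides along on an ordinary application message,
        # so no extra control message is needed.
        return Message(self.node_id, payload, {self.node_id: self.hormone})

    def receive(self, msg: Message) -> str:
        self.known.update(msg.hormones)  # update the local view from the piggy-backed data
        return msg.payload

    def should_take_task(self) -> bool:
        # Purely local decision: take the task only if no value heard so far is higher.
        # Ties are broken by node id, so among nodes that have heard each other one wins.
        mine = (self.hormone, self.node_id)
        return all(mine > (v, nid) for nid, v in self.known.items() if nid != self.node_id)


a, b = Node(1, 0.8), Node(2, 0.5)
b.receive(a.send("sensor data"))
a.receive(b.send("ack"))
print(a.should_take_task(), b.should_take_task())  # True False
```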